-
Notifications
You must be signed in to change notification settings - Fork 4k
mmaprototype: extract analyzeFunc #158175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This commit moves analyzeFunc out of finishInit into its own helper improve readability.
This commit moves doneFunc out of analyzeFunc into its own helper function to improve readability.
This commit adds more comments for rangeAnalyzedConstraints.finishInit.
This commit moves diversity score computation out of analyzeFunc into its own helper function to improve readability.
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM in that it doesn't seem to break anything and the comments are 💯x better. But I'd need to spend a few more hours on the constraints code that I haven't spent yet to really say that everything in the comments is right.
Better let @sumeerbhola have a pass too, if only to settle your remaining questions, which I can't help with right now.
@tbg reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 1040 at r4 (raw file):
I feel like this would be easier to grok if this were framed as "we needed a function with certain properties. Here is a function". As is, from reading the comments, I have no idea what it does, and would probably need to look at tests.
But fundamentally, we are comparing slices and want to measure how "diverse" they are. This smells like a thing mathematicians have thought about before.
I asked Cursor about this and it said:
// diversityScore computes a normalized prefix-based distance metric between two
// locality tier sequences. This is mathematically equivalent to a normalized
// Longest Common Prefix (LCP) distance, which measures how far two hierarchical
// paths diverge in a tree structure.
//
// The metric is conceptually similar to:
// - Taxonomic distance in biology (measuring divergence in classification trees)
// - Tree path distance (normalized distance between two paths from root to leaf)
// - Normalized edit distance (simplified to only consider prefix differences)
//
// Given two sorted sequences of strings representing hierarchical locality tiers
// (e.g., [region, zone, rack]), the score is computed as:
// - If they differ at position i: (length - i) / length
// - If identical: 0 (minimum diversity, maximum similarity)
// - If no common prefix (i=0): 1.0 (maximum diversity, minimum similarity)
//
// The choice of 1.0 for maximum diversity (when sequences share no common prefix)
// is a practical normalization to avoid dealing with infinities while maintaining
// a bounded metric in [0, 1].
I don't know if this is exactly correct - we do a lot of summing things up and normalizing - but somehow mentioning that the base concept here is "length-mismatchPos/length" should give readers a pretty good grasp on what this "does".
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 705 at r3 (raw file):
// since we may populate satisfiedByReplica[nonVoterIndex][i] for voter // constraints as well. This just means that the store can satisfy // constraints[i] regardless of the replica type.
You kind of lost me here but in a good way, making me realize I don't actually know how this works. I've tried a few times to write my best understanding here but it always comes up garbled. I think the root of my confusion is: we have constraints and we have voter constraints. Yet, analyzedConstraints has no idea which it is (I think?). It knows which stores have voters and which ones have non-voters. It just computes which stores match which constraint and divvies this up by voter/non-voter status.
Let's say each constraint in the conjunction has a different set of voters satisfying it, and the numbers match up with the num_replicas in each constraint. Surely then the constraint, interpreted as a voter constraint, would be satisfied?
Some examples of what is and is not true in general would really help me grok this.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 855 at r3 (raw file):
// analyzeFunc analyzes the current replica set and determines which constraints // replicas satisfy, populating ac.satisfiedByReplica and // ac.satisfiedNoConstraintReplica (both empty at the beginning). They are later
I think I saw that this method is invoked multiple times. Is the space just going to be re-used? Or am I misremembering the multiple calls thing?
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 873 at r5 (raw file):
// phase 3. A constraint may remain under-satisfied. If a store is // under-satisfied in phase 2, it cannot be corrected in phase 3 and will // continue to be under-satisfied. (TODO: Is this right during review)
I wish I could tell you. This is currently all out of my league.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 1002 at r4 (raw file):
} // diversityFunc computes the diversity score between two sets of stores.
Does this have any meaningful testing? Might be a good one for @angeladietz's queue otherwise.
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my comments are about code comments, and just a result of applying the principle stated in one of the comments (a) data-structure comments belong where the data-structure is declared, (b) algorithmic comments belong inside functions, (c) function declaration comments are about input and output, without explaining the intricacies of the input and output if those happen to be data-structures that have been explained where they are declared.
So they should be mostly addressable by moving the code comments and making sure they are not consistent with the existing code comments.
@sumeerbhola reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 1 of 1 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @angeladietz and @wenyihu6)
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 1012 at r4 (raw file):
// is needed. For example, given two sets of stores [1, 2, 3] and [1, 2, 3], // diversity score should be computed among pairs (1, 2), (1, 3), (2, 3) only. func diversityFunc(
diversityOfTwoStoreSets?
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 1057 at r4 (raw file):
// (voter, voter), (nonvoter, nonvoter), (voter, nonvoter) totalSum, totalSamples := 0.0, 0 vs, ns := diversityFunc(replicas[voterIndex], replicas[voterIndex], true)
why is this being called again?
It used to be called 3 times: voter x voter; non-voter x non-voter; voter x non-voter.
Because we can use the first call in two places. Just repeating the call is very confusing and unnecessary.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 833 at r1 (raw file):
} func (ac *analyzedConstraints) analyzeFunc(
nit: initialize would be a better method name.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 853 at r3 (raw file):
} // analyzeFunc analyzes the current replica set and determines which constraints
as mentioned before, we can call this the initialization function and pass []internedConstraintsConjunction as another parameter (instead of having the caller set is immediately prior to calling this function). Then one doesn't need to talk about which fields are initialized and what their previous value was.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 862 at r3 (raw file):
// For stores that satisfy no constraint, // ac.satisfiedNoConstraintReplica[kind][i] will be populated with the replica // storeIDs that satisfy no constraint.
IMHO, we should try to restrict method declaration comments to what it does without going into the meaning of data-structures.
For example, what satisfiedNoConstraintReplica will be populated with belongs in that data-structure comment. Otherwise we end up with repetitive comments (which can't be kept consistent).
At least for me, I find looking at the data-structure comments as a good starting point for getting into a piece of code -- that's where the semantics of things are explained.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 866 at r3 (raw file):
// For stores that satisfy at least one constraint, // ac.satisfiedByReplica[kind][i] is filled with the storeIDs of replicas that // satisfy ac.constraints[i]. Each store should appear exactly once in the 2D
This is another data-structure comment.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 869 at r3 (raw file):
// slice ac.satisfiedByReplica[kind]. A constraint can end up over-satisfied // (len(ac.satisfiedByReplica[kind][i]) > ac.constraints[i].numReplicas) in // phase 3. Phase 2 below attempts to correct this by prioritizing
This is the first mention of a phase. Can this be laid out in the format.
// The algorithm proceeds in phased.
// In phase 1, ...
// In phase 2, ...
// ...
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 870 at r3 (raw file):
// (len(ac.satisfiedByReplica[kind][i]) > ac.constraints[i].numReplicas) in // phase 3. Phase 2 below attempts to correct this by prioritizing // under-satisfied constraints and assigning stores to those first.
How can phase 2 correct the work in phase 3 if it precedes it?
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 875 at r3 (raw file):
// constraints, but we still populate both voter and non-voter indices // regardless because we need to know which constraints are satisfied by all // replicas to determine when promotions or demotions should occur.
another data-structure comment that should be moved to where satisfiedByReplica is declared.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 877 at r3 (raw file):
// replicas to determine when promotions or demotions should occur. // // It has three phases: 1. For each replica (voter and non-voter), determine
At this point in the code, I think a sentence summarizing what the stage is doing without referring to actual fields is better. Foe example, buf.replicaConstraintIndices[kind][i] will be populated is too much detail since we aren't at that point in the code yet. The more detailed comment can precede the actual code for each phase.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 884 at r3 (raw file):
// // buf.replicaConstraintIndices[kind][i] will be populated with the constraint // indices that the i+1-th voter/non-voter satisfies. This serves as a scratch
nit: voter/non-voter can be replaced by replica.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 902 at r3 (raw file):
// satisfies both c1(num=1) and c2(num=1), we should prefer assigning s2 to c2 // rather than consuming it again for c1. TODO(wenyihu6): Verify behavior in // cases like: s1 satisfies c1(num=1) and c2(num=1), s2 satisfies only c1, and
This second example should work, because we assign the places for stores that only satisfy one constraint first. But there are probably examples one can construct where things don't work out well.
IIUC, operators tend to use non-overlapping constraints (except for the empty conjunction), for which this works well enough.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 1003 at r3 (raw file):
// finishInit analyzes the span config and the range’s current replica set. It // prepares the MMA to compute lease-transfer and rebalancing candidates while
I am not keen on "It prepares the MMA ...". This is the purpose for which rangeAnalyzedConstraints exists -- it belongs where rangeAnalyzedConstraints is declared.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 1446 at r3 (raw file):
// constraints. We still populate both voter and non-voter indexes because // we need to know which constraints are satisfied by all replicas to // determine when promotions or demotions should occur.
This is a good comment, and exactly where it belongs.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 871 at r5 (raw file):
// A constraint may end up being over-satisfied // (len(ac.satisfiedByReplica[kind][i]) > ac.constraints[i].numReplicas) in // phase 3. A constraint may remain under-satisfied. If a store is
If a constraint is under-satisfied ...
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 873 at r5 (raw file):
// phase 3. A constraint may remain under-satisfied. If a store is // under-satisfied in phase 2, it cannot be corrected in phase 3 and will // continue to be under-satisfied. (TODO: Is this right during review)
Yes, it will stay under-satisfied because we attempted to not over-satisfy and went through each store, so the remaining stores are ones that can only over-satisfy.
This may be subject to change when we alter the logic in doneFunc.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 835 at r2 (raw file):
// isConstraintSatisfied checks if the given constraint index has been fully // satisfied by the stores currently assigned to it. // TODO(wenyihu6): voter constraint should only count voters here (#158109)
this is only used during initialization, iirc. I don't think this deserves its own method.
Consider lifting it to a closure outside the loop.
pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 894 at r2 (raw file):
append(ac.satisfiedByReplica[kind][j], buf.replicas[kind][i].StoreID) buf.replicaConstraintIndices[kind][i] = constraintIndices[:0] satisfied = ac.isConstraintSatisfied(j)
breaking out of nested loops using a "done" bool is a very common pattern. It is ok to name it satisfied for it to be more meaningful, but comments like "NB: We check the satisfied flag directly ..." seem excessive, given this common pattern. IMHO, they get in the way of reading the code.
Epic: CRDB-55052
Part of: #158163
Release note: none
mmaprototype: extract analyzeFunc
This commit moves analyzeFunc out of finishInit into its own helper improve
readability.
mmaprototype: extract doneFunc
This commit moves doneFunc out of analyzeFunc into its own helper function to
improve readability.
mmaprototype: add more comments for finishInit
This commit adds more comments for rangeAnalyzedConstraints.finishInit.
mmaprototype: extract diversity score computation
This commit moves diversity score computation out of analyzeFunc into its own
helper function to improve readability.